I have performed extensive testing with Microsoft TCP32 3.11, 3.11a, and recently, the 3.11b beta, with Netmanage Chameleon 4.02 and with Trumpet 2.0B. All exhibit problems in handling connection aborts. Netscape issues more of these aborts, on average, because it can have multiple connections open when the user jumps off a link or hits the stop sign. This is NOT a bug in Netscape! However, this kind of use of TCP is something that, in my opinion, has not been tested on at least some of the Windows TCP packages.
NOTE: My explanations will take the view of a client/browser and a server, but the process is more general, and needn't follow the steps I describe.
The client sends a SYN packet to the server, the server sends a SYN/ACK
back to the client ("I saw your SYN, and here's mine"), then the client
sends an ACK back to the server ("OK, I saw your SYN too, we're all
set"). At this point the connection is established.
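A minimal sketch of the point above, assuming Python's standard socket module on a loopback interface rather than the Winsock API (the function name is hypothetical). It shows that the three-way handshake is carried out by the TCP stack itself: the client's connect() returns before the server program has called accept().

```python
import socket

def handshake_before_accept():
    """Return (client address, peer address seen by the server) for one connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)
    listener.settimeout(5)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    # connect() returns once SYN, SYN/ACK and ACK have been exchanged.
    # The server program has not called accept() yet.
    client.connect(listener.getsockname())
    client_addr = client.getsockname()

    # accept() merely hands over the connection the stack already established.
    work, peer_addr = listener.accept()
    work.close()
    client.close()
    listener.close()
    return client_addr, peer_addr
```

The window between connect() returning and accept() being called is exactly where an early abort from the client can land.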
There are 2 ways to close a connection, the "hard" way and the "soft" way. When the server sends its response and closes the connection, it must do a soft close, wherein all unsent data is delivered to the client before the socket is actually closed. On the other hand, if the client wants to abort the connection, it should do a hard close, where unsent data is trashed and the connection is closed immediately.
In TCP terms a soft close is done by one end (A) sending a FIN message.
The other end (B) sees the FIN, but it may have more to do, so it notes
it and goes on. Eventually B is finished and sends a FIN back to A.
Meanwhile, A is still alive responding to B so B can finish up. When A
sees B's FIN, it knows it's all over and shuts down. B can shut down
when it sends the FIN to A. This is actually a white lie: there are
several interlock states in the TCP machine that the sockets go through
on their way down, and they involve ACK exchanges and the using program
calling closesocket().
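The FIN exchange can be sketched with Python's socket module over loopback (an assumption; the paper itself concerns Winsock, and the helper names here are hypothetical). A sends its FIN with shutdown(SHUT_WR), B keeps sending afterwards, and each side sees the other's FIN as an empty read.

```python
import socket

def tcp_pair():
    """Return a connected (a, b) pair of TCP sockets over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.connect(listener.getsockname())
    b, _ = listener.accept()
    listener.close()
    a.settimeout(5)
    b.settimeout(5)
    return a, b

def read_until_fin(sock):
    """Collect data until an empty read signals that the peer's FIN arrived."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def soft_close_demo():
    a, b = tcp_pair()
    a.sendall(b"request")
    a.shutdown(socket.SHUT_WR)        # A sends FIN: nothing more to send
    seen_by_b = read_until_fin(b)     # B gets all of A's data, then the FIN
    b.sendall(b"response")            # B is not finished yet and keeps sending
    b.close()                         # B sends its own FIN
    seen_by_a = read_until_fin(a)     # A stayed alive to receive B's data
    a.close()
    return seen_by_b, seen_by_a
```

Calling soft_close_demo() returns (b"request", b"response"): no data is lost in either direction, which is the property a server needs when it closes after sending its response.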
A hard close is where A sends a RST packet to B. At that point, B says
"forget it", dumps its unsent data on the floor and shuts down the
socket (again through a few intermediate states). A does the same. End
of story.
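A hard close can be sketched as follows, again assuming Python's socket module on a POSIX-style stack (the function name is hypothetical). Setting SO_LINGER to "on" with a zero timeout makes close() send a RST instead of a FIN. The "ii" layout matches the POSIX struct linger; Winsock's struct uses two 16-bit fields, so the packing would differ there.

```python
import socket
import struct

def hard_close_demo():
    """Abort a connection from side A and report what side B observes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.connect(listener.getsockname())
    b, _ = listener.accept()
    listener.close()
    b.settimeout(5)

    # l_onoff=1, l_linger=0: close() discards unsent data and sends a RST.
    # Assumption: POSIX struct linger layout (two C ints).
    a.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    a.close()

    try:
        data = b.recv(1024)
        outcome = "data" if data else "eof"   # a FIN would show up as "eof"
    except ConnectionResetError:
        outcome = "reset"                     # the RST arrived
    b.close()
    return outcome
```

On a stack that honors the hard close, B should report "reset" rather than "eof". A stack that turns the abort into a soft close would report "eof" instead.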
With the Trumpet package, people have reported orphan socket killed n
and WASNotSock(n) errors in the Trumpet TCPMAN log window. The cause, in
my opinion, is this: The client hard-closes the socket almost
immediately after starting the 3-way connection setup process. The
server end's TCP has not completed the connect setup, and either the
listening socket or the newly cloned "work" socket is killed before the
server ever gets to it in the first place. I believe this is what is
meant by "orphan" socket.
In the case where the listener socket itself is killed by the incoming
RST, it's the end of the line for the server. It never again will
receive any incoming connections because the listener socket, the one
responsible for servicing incoming connection requests, has been killed
by the client's RST.
If the client's RST comes in just a bit later, after the TCP has
converted the incoming connection to the "work" socket, but before it
completes the server's accept() call, it appears that the work socket is
"orphan kill"ed but the server gets the FD_ACCEPT message, and tries to
do an accept(). The accept may succeed, but attempts to do I/O to the
socket fail and then the server tries to close it. Since the work socket
was bad from the start, the attempt to close it generates the WASNotSock
error.
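The scenario above suggests that a server should treat every step after the connection notification as something that can fail. A hedged sketch, assuming Python's socket module on a POSIX-style stack (serve_one and abort_before_accept are hypothetical names, not part of any server discussed here): the client aborts before accept() is called, and the server maps each failure to an outcome instead of crashing.

```python
import socket
import struct

def serve_one(listener):
    """Accept one connection and report how it ended, without raising."""
    try:
        work, _ = listener.accept()
    except OSError:
        return "accept-failed"    # the stack dropped the connection, or we timed out
    try:
        work.settimeout(2)
        data = work.recv(1024)
        return "data" if data else "eof"
    except OSError:
        return "io-failed"        # accept worked, but the socket was already dead
    finally:
        try:
            work.close()          # closing a dead socket may itself fail
        except OSError:
            pass

def abort_before_accept():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(2)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    # Hard close (assumed POSIX struct linger layout) before the server accepts.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()
    outcome = serve_one(listener)
    listener.close()
    return outcome
```

Which outcome appears depends on the TCP stack: some hand the dead socket to accept() and fail on the first I/O, others discard the connection so that accept() never sees it.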
If the RST comes in a bit later still, the accept completes, the server gets an FD_CLOSE message and things terminate gracefully.
Again, see the message from Peter Tattam at the end of this paper. He's on to the problems and he really wants to fix them.
Chameleon reacts to RSTs during the connection process by getting out
of whack and leaving the work socket in the SYN_RECEIVED state. It
fails to see the RST, and sits there waiting for the ACK to its part of
the SYN handshake. There is no way to get rid of the dead socket except
to reboot the system, as far as I can tell. After a while, these dead
sockets can pile up and fill the socket table. I may have seen one
instance where the listener socket gets out of whack also. Believe me,
this is a difficult scenario to diagnose for a TCP neophyte like me.
A more insidious error that Chameleon has, however, is its failure to honor the hard close call made by the browser. In this case, the browser forgets about the socket, thinking it successfully hard-closed it. Meanwhile, the server thinks the client issued a soft close and keeps sending to the client. Eventually the low-level buffers at the client end fill (the receive window closes), and the connection becomes stalled. Now the server sits there waiting for the client to start reading again, which it obviously isn't going to do. Eventually, the server's sanity timer goes off, and it closes the socket. However, on at least one TCP package (unnamed until I contact the authors) you cannot close a stalled socket and the server side gets wedged. I believe this is NOT the fault of the server side socket implementation, because the root of the problem is Chameleon's failure to honor the client's hard close request in the first place.
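The stalled-connection case and the server's sanity timer can be sketched like this, assuming Python's socket module on a POSIX-style stack (send_and_close and stalled_client_demo are hypothetical names, and the timeout and payload size are arbitrary). The client never reads, so its receive window closes, the server's send stalls, and the timer fires. The server then selects a hard close so that close() does not wait behind the queued data.

```python
import socket
import struct

def send_and_close(sock, payload, sanity_timeout_s):
    """Send payload, then close; abort the connection if the send stalls."""
    sock.settimeout(sanity_timeout_s)
    try:
        sock.sendall(payload)
        outcome = "sent"
    except socket.timeout:
        # The peer stopped reading. Select a hard close (l_onoff=1,
        # l_linger=0; assumed POSIX struct linger layout) before closing.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
        outcome = "stalled"
    sock.close()
    return outcome

def stalled_client_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    # The client never calls recv(), like a browser that has forgotten
    # the socket. 64 MB is far more than the loopback buffers will hold.
    outcome = send_and_close(server_side, bytes(64 * 1024 * 1024), 0.5)
    client.close()
    return outcome
```

The sketch only works on a stack that can close a stalled socket, which is the very thing the unnamed package above fails to do.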
I have tested Netmanage's "Armadillo" beta, which has now been released as Chameleon 4.5. This package appears to be solid, and somewhat faster than the 4.0x versions.
It's a cruel world out there. As most of you on comp.infosystems.www.providers know, I have spent a great deal of time trying to get the Windows Web server really reliable and capable of high performance. Keep in mind how few messages you have seen regarding the server's capabilities, adherence to HTTP, and features. Most of the traffic relates to problems traceable to the filthy hardware and software environment of the PC and the not-ready-for-prime-time TCP packages out there.
I DO NOT MEAN TO SINGLE OUT CERTAIN PACKAGES. Those were simply the
ones I had the time to test with. I am sure there are others out there
that have similar kinds of problems. For example, I do know that Novell
LAN Workplace doesn't even _support_ the SO_LINGER socket option, which
is used to select the hard versus soft socket close.
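Because some stacks lack SO_LINGER altogether, a program can probe for it rather than assume it. A minimal sketch, assuming Python's socket module and the POSIX struct linger layout (select_close_mode is a hypothetical helper; under Winsock the structure uses two 16-bit fields):

```python
import socket
import struct

def select_close_mode(sock, hard):
    """Select a hard or soft close via SO_LINGER; return False if unsupported."""
    # hard close: l_onoff=1, l_linger=0 (close() sends RST, unsent data dropped)
    # soft close: l_onoff=0             (close() sends FIN after queued data)
    value = struct.pack("ii", 1 if hard else 0, 0)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)
    except OSError:
        return False      # the stack rejected the option
    return True

def current_linger(sock):
    """Read back (l_onoff, l_linger) as the stack reports them."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    return struct.unpack("ii", raw)
```

A caller that gets False back knows it cannot request a hard close on that stack and has to fall back to an ordinary close.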
I am a whole lot smarter than I was six months ago, but I still feel pretty
inexperienced.
REFERENCE: USENET Post from Peter Tattam, Trumpet Software.
-------- BEGIN INCLUDED MESSAGE --------
>Xref: netcom.com alt.winsock:26667
>Path: netcom.com!ix.netcom.com!howland.reston.ans.net!pipex!uunet!munnari.oz.au!newsroom.utas.edu.au!newsroom.trumpet.com.au!jimmy.trumpet.com.au!peter
>Newsgroups: alt.winsock,trumpet.bugs,panix.ppp-slip.general
>From: peter@trumpet.com.au (Peter R. Tattam)
>Subject: Re: Trumpet Winsock 2.0B problem
>Date: Wed, 21 Dec 1994 18:15:11 +1100
>Message-ID:
>Lines: 80
>References: <1994Dec16.104031.5526@alder.cc.kcl.ac.uk>
>Organization: Trumpet Software International Pty Ltd.
>X-Newsreader: Trumpet for Windows [Version 1.0 Rev B final beta #4]

[ long explanation of problem deleted (rbd) ]

The problems have been resolved to the best of our knowledge.

One problem was bad packets would somehow get through the IP header
checksum and end up grunging memory later on. The fix has been to
religiously check each packet for length, and headers for length. this
is done both at the SLIP & also at the IP layer.

Another problem we resolved was if a valid packet got through these
sections and made it to the IP fragment reassembly code with an oversize
fragment (> 1500), memory would again be grunged. Again fixed.

Finally, in the beta program from 1.0A to 2.0, some of the listen/accept
code was remodelled in an attempt to make the winsock pass the WSAT
tests. This has introduced a number of bugs where TCP structures get
lost, and along with them IP buffers...this results in tangled buffer
queues and memory problems. To tell if you are hitting this bug, the
telltale sign is that the IP buffers get lost, and sometimes listening
sockets just won't work properly. We are confident that we have got to
the root of this problem and are currently acid testing the latest
release. The winsock runs fine for at least a week in an full internet
server environment without any ill effects now.

[edited]

One program eludes us - httpd. I'm going to throw this program under the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
microscope to see what's happening. The problem only manfests itself
when Netscape kills a connection before it is accepted by httpd I think.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Right on! I came to the same conclusion last weekend!]

Of course when it comes to other apps tramping all over windows, there's
not a lot we can do. The winsock data structures are required to be in
shared memory, and this carries a degree of risk. If the applications
are well behaved - no problems... but we know how many windows apps
aren't well behaved... :-)

Peter
--
Peter R. Tattam - Managing Director P.Tattam@trumpet.com.au
Trumpet Software International Pty Ltd. Phone: 61-02-450220 Fax: 61-02-450210
<rdenny@netcom.com>